Convergence Rates of Gaussian ODE Filters
A recently introduced class of probabilistic (uncertainty-aware) solvers for
ordinary differential equations (ODEs) applies Gaussian (Kalman) filtering to
initial value problems. These methods model the true solution and its first
derivatives \emph{a priori} as a Gauss--Markov process,
which is then iteratively conditioned on information about . This
article establishes worst-case local convergence rates of order for a
wide range of versions of this Gaussian ODE filter, as well as global
convergence rates of order in the case of and an integrated Brownian
motion prior, and analyses how inaccurate information on coming from
approximate evaluations of affects these rates. Moreover, we show that, in
the globally convergent case, the posterior credible intervals are well
calibrated in the sense that they globally contract at the same rate as the
truncation error. We illustrate these theoretical results by numerical
experiments which might indicate their generalizability to .
Comment: 26 pages, 5 figures
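The filtering recursion these abstracts describe can be illustrated on a toy problem. The following is a minimal, hypothetical sketch (assuming the so-called EKF0 variant, in which the "data" at each step is the vector field evaluated at the predicted mean, a once-integrated Brownian motion prior, and a noise-free measurement model); it is not the authors' implementation.

```python
import math
import numpy as np

def ode_filter(f, x0, t_end, h, sigma2=1.0):
    """Gaussian ODE filter for x'(t) = f(x(t)), state m = (x, x')."""
    A = np.array([[1.0, h], [0.0, 1.0]])                 # IBM(1) transition
    Q = sigma2 * np.array([[h**3 / 3, h**2 / 2],
                           [h**2 / 2, h]])               # IBM(1) process noise
    Hm = np.array([[0.0, 1.0]])                          # observe the derivative
    m = np.array([x0, f(x0)])                            # exact initialization
    P = np.zeros((2, 2))
    for _ in range(round(t_end / h)):
        m_pred = A @ m                                   # predict
        P_pred = A @ P @ A.T + Q
        z = f(m_pred[0])                                 # "data": f at predicted mean
        S = (Hm @ P_pred @ Hm.T)[0, 0]                   # innovation variance (R = 0)
        K = (P_pred @ Hm.T)[:, 0] / S                    # Kalman gain
        m = m_pred + K * (z - m_pred[1])                 # update
        P = P_pred - np.outer(K, K) * S
    return m, P

# x' = -x, x(0) = 1, whose solution at t = 1 is e^{-1}
m, P = ode_filter(lambda x: -x, x0=1.0, t_end=1.0, h=0.01)
```

The posterior mean `m[0]` approximates the true solution, and the posterior variance `P[0, 0]` is the filter's own (calibrated, per the result above) error estimate.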
Uncertainty-Aware Numerical Solutions of ODEs by Bayesian Filtering
Numerical analysis is the branch of mathematics that studies algorithms that compute approximations of well-defined, but analytically-unknown mathematical quantities. Statistical inference, on the other hand, studies which judgments can be made on unknown parameters in a statistical model. By interpreting the unknown quantity of interest as a parameter and providing a statistical model that relates it to the available numerical information (the 'data'), we can thus recast any problem of numerical approximation as statistical inference. In this way, the field of probabilistic numerics introduces new 'uncertainty-aware' numerical algorithms that capture all relevant sources of uncertainty (including all numerical approximation errors) by probability distributions.
While such recasts have been a decades-long success story for global optimization and quadrature (under the names of Bayesian optimization and Bayesian quadrature), the equally important numerical task of solving ordinary differential equations (ODEs) has been, until recently, largely ignored. With this dissertation, we aim to further shed light on this area of previous ignorance in three ways: Firstly, we present a first rigorous Bayesian model for initial value problems (IVPs) as statistical inference, namely as a stochastic filtering problem, which unlocks the employment of all Bayesian filters (and smoothers) to IVPs. Secondly, we theoretically analyze the properties of these new ODE filters, with a special emphasis on the convergence rates of Gaussian (Kalman) ODE filters with integrated Brownian motion prior, and explore their potential for (active) uncertainty quantification. And, thirdly, we demonstrate how employing these ODE filters as a forward simulator engenders new ODE inverse problem solvers that outperform classical 'uncertainty-unaware' ('likelihood-free') approaches.
This core content is presented in Chapter 2. It is preceded by a concise introduction in Chapter 1 which conveys the necessary concepts and locates our work in the research environment of probabilistic numerics. The final Chapter 3 concludes with an in-depth discussion of our results and their implications.
On the Theoretical Properties of Noise Correlation in Stochastic Optimization
Studying the properties of stochastic noise to optimize complex non-convex
functions has been an active area of research in the field of machine learning.
Prior work has shown that the noise of stochastic gradient descent improves
optimization by overcoming undesirable obstacles in the landscape. Moreover,
injecting artificial Gaussian noise has become a popular idea to quickly escape
saddle points. Indeed, in the absence of reliable gradient information, the
noise is used to explore the landscape, but it is unclear what type of noise is
optimal in terms of exploration ability. In order to narrow this gap in our
knowledge, we study a general type of continuous-time non-Markovian process,
based on fractional Brownian motion, that allows for the increments of the
process to be correlated. This generalizes processes based on Brownian motion,
such as the Ornstein-Uhlenbeck process. We demonstrate how to discretize such
processes which gives rise to the new algorithm fPGD. This method is a
generalization of the known algorithms PGD and Anti-PGD. We study the
properties of fPGD both theoretically and empirically, demonstrating that it
possesses exploration abilities that, in some cases, are favorable over PGD and
Anti-PGD. These results open the field to novel ways to exploit noise for
training machine learning models.
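The abstract's exact discretization of fPGD is not given here, but the key ingredient, fractional Gaussian noise (the increments of fractional Brownian motion with Hurst index H), can be sampled exactly from its autocovariance gamma(k) = 0.5(|k+1|^{2H} + |k-1|^{2H} - 2|k|^{2H}). The sketch below (hypothetical names `fgn_cov`, `sample_fgn`; the perturbed-GD loop is a generic illustration, not the paper's algorithm) uses a Cholesky factor; H = 0.5 recovers independent Brownian increments (as in PGD), H < 0.5 gives anti-correlated increments (as in Anti-PGD), and H > 0.5 positively correlated ones.

```python
import numpy as np

def fgn_cov(n, H):
    """Covariance matrix of n consecutive fractional-Gaussian-noise increments."""
    k = np.arange(n, dtype=float)
    gamma = 0.5 * ((k + 1) ** (2 * H) + np.abs(k - 1) ** (2 * H) - 2 * k ** (2 * H))
    idx = np.abs(np.subtract.outer(np.arange(n), np.arange(n)))
    return gamma[idx]                                    # Toeplitz: entry (i,j) = gamma(|i-j|)

def sample_fgn(n, H, rng):
    """Exact O(n^3) sampling via a Cholesky factor of the covariance."""
    L = np.linalg.cholesky(fgn_cov(n, H) + 1e-10 * np.eye(n))
    return L @ rng.standard_normal(n)

# generic perturbed gradient descent on the quadratic 0.5*w^2, driven by fGN
rng = np.random.default_rng(0)
n_steps, lr, noise_scale, H = 200, 0.1, 0.01, 0.3
noise = sample_fgn(n_steps, H, rng)
w = 5.0
for t in range(n_steps):
    w = w - lr * w + noise_scale * noise[t]              # gradient of 0.5*w^2 is w
```

For H = 0.5 the covariance is the identity, so this reduces to ordinary i.i.d. Gaussian perturbations; the Cholesky route is simple but cubic in the number of steps, so circulant-embedding methods are the usual choice for long trajectories.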
Validation of a Three-Item Short Form of the Modified Weight Bias Internalization Scale (WBIS-3) in the German Population
Introduction: Individuals suffering from overweight or obesity frequently experience weight-based stigmatization. The widespread belief that weight is a matter of personal will and self-control results in various weight-based stereotypes (e.g., laziness, lack of self-discipline, or
neglect). Objective: Based on the modified version of the Weight Bias Internalization Scale
(WBIS-M), a short form for the economic assessment of weight bias internalization in the general population was compiled and validated. Methods: A three-item short form (WBIS-3) was
derived based on data from a representative sample of the German population (n = 1,092).
This new short form was validated in a second representative population sample (n = 2,513).
Item characteristics and internal consistency were obtained. Measurement invariance was
tested. Construct validity was established via the correlation with theoretically related constructs (depression, anxiety, eating behavior, discrimination, weight status). To establish scale
validity, all analyses were performed for the whole sample as well as for the subsample of individuals with overweight. Age- and gender-specific population norms were provided. Results: The WBIS-3 exhibited excellent psychometric properties. Internal consistency was α =
0.92. Strong measurement invariance was confirmed regarding age, gender, discrimination,
and weight status in both the whole sample and the overweight subsample. Conclusions: The WBIS-3 constitutes a valid and economical tool for the assessment of weight bias internalization in epidemiological contexts. Measurement invariance allows for an unbiased comparison of means, correlation coefficients, and path coefficients within structural equation modeling across groups.
An SDE for Modeling SAM: Theory and Insights
We study the SAM (Sharpness-Aware Minimization) optimizer which has recently
attracted a lot of interest due to its increased performance over more
classical variants of stochastic gradient descent. Our main contribution is the
derivation of continuous-time models (in the form of SDEs) for SAM and two of
its variants, both for the full-batch and mini-batch settings. We demonstrate
that these SDEs are rigorous approximations of the real discrete-time
algorithms (in a weak sense, scaling linearly with the learning rate). Using
these models, we then offer an explanation of why SAM prefers flat minima over
sharp ones~--~by showing that it minimizes an implicitly regularized loss with
a Hessian-dependent noise structure. Finally, we prove that SAM is attracted to
saddle points under some realistic conditions. Our theoretical results are
supported by detailed experiments.
Comment: Accepted at ICML 2023 (Poster)
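The discrete-time update that the paper's SDEs approximate is the standard two-step SAM iteration: an ascent step eps_t = rho * g_t / ||g_t|| to an adversarially perturbed point, then a descent step using the gradient evaluated there. The sketch below is a generic full-batch SAM step on a toy quadratic (values of `rho` and `eta` are illustrative), not the paper's continuous-time model.

```python
import numpy as np

def sam_step(w, grad_fn, rho=0.05, eta=0.1):
    """One full-batch SAM update: descend using the gradient at w + eps."""
    g = grad_fn(w)
    eps = rho * g / (np.linalg.norm(g) + 1e-12)          # normalized ascent direction
    return w - eta * grad_fn(w + eps)                    # gradient at perturbed weights

# toy loss L(w) = 0.5 * ||w||^2, whose gradient is w
grad_fn = lambda w: w
w = np.array([3.0, -4.0])
loss0 = 0.5 * np.dot(w, w)
for _ in range(100):
    w = sam_step(w, grad_fn)
loss1 = 0.5 * np.dot(w, w)
```

Each step costs two gradient evaluations; the implicit sharpness penalty the paper derives comes from the fact that the descent direction is the loss gradient at the locally worst-case point rather than at the current iterate.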